Memory Access Synchronization in Vector Multiprocessors
Authors
Abstract
In vector multiprocessor systems, collisions in the interconnection network and conflicts in the memory modules are the main causes of performance degradation. In this work we propose synchronizing access to the memory system so that streams can be accessed with the minimum achievable latency if their elements are requested out of order. The mechanism uses a block-interleaved storage scheme and works for strides belonging to the most common families of strides found in real programs. The required hardware is also described, and its complexity is shown to be equivalent to that of the address generator needed when the processors request the elements in order.
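The effect of request order under block interleaving can be illustrated with a small sketch. It is not the mechanism proposed in the paper: the mapping `module = (address // block_size) % num_modules`, the parameter values, and the round-robin reordering helper are all assumptions chosen only to show why an out-of-order request sequence can spread a stream across memory modules.

```python
from collections import defaultdict, deque

def module_of(address, num_modules, block_size):
    # Assumed block-interleaved mapping: each run of block_size
    # consecutive words lives in one module, and consecutive blocks
    # go to consecutive modules.
    return (address // block_size) % num_modules

def stream_modules(base, stride, length, num_modules, block_size):
    # Module touched by each element of a strided stream, in element order.
    return [module_of(base + i * stride, num_modules, block_size)
            for i in range(length)]

def max_requests_per_module(modules, window):
    # Worst-case number of requests hitting a single module within any
    # window of consecutive requests (1 means no conflict in any window).
    worst = 0
    for start in range(len(modules) - window + 1):
        chunk = modules[start:start + window]
        worst = max(worst, max(chunk.count(m) for m in set(chunk)))
    return worst

def round_robin_order(modules):
    # Hypothetical reordering: visit the modules cyclically, taking the
    # next pending element of each one. Illustrative only.
    queues = defaultdict(deque)
    for index, module in enumerate(modules):
        queues[module].append(index)
    order = []
    while any(queues.values()):
        for module in sorted(queues):
            if queues[module]:
                order.append(queues[module].popleft())
    return order

# Assumed configuration: 4 modules, blocks of 4 words, stride-1 stream of 16 elements.
NUM_MODULES, BLOCK_SIZE = 4, 4
in_order = stream_modules(0, 1, 16, NUM_MODULES, BLOCK_SIZE)
order = round_robin_order(in_order)
reordered = [in_order[i] for i in order]

worst_in_order = max_requests_per_module(in_order, NUM_MODULES)
worst_reordered = max_requests_per_module(reordered, NUM_MODULES)
print(worst_in_order, worst_reordered)
```

With these assumed parameters, the in-order stream sends four consecutive requests to the same module, while the reordered stream touches each module once in every window of four requests.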
Similar Resources
A New Synchronization Scheme for Memory Consistency Model (Extended Abstract)
Modernistic scalable multiprocessors are mostly built with a distributed-shared memory architecture. Large scale shared memory multiprocessors have long memory latencies for the remote memory access. And these latencies can quickly offset system performance earned from the exploitation of parallelism. In order to improve system performance, we must reduce memory latencies. The useful way for th...
Queue Locks on Cache Coherent Multiprocessors
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&(Test&Set). For architectures with long access latencies for global data, a...
Energy-Aware Microprocessor Synchronization: Transactional Memory vs. Locks
One important way in which multiprocessors differ from uniprocessors is in the need to provide programmers the ability to synchronize concurrent access to memory. Transactional memory was proposed as a way of improving throughput especially when the rate of synchronization conflict is low. In this paper we explore power implications of transactional memory on standard and synthetic benchmarks. ...
Efficient Software Synchronization on Large Cache Coherent Multiprocessors
Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&(Test&Set). For architectures with long access latencies for global data, a...